The idea of rare event sampling by Multicanonical Monte Carlo is applied to the estimation
of the performance of error-correcting codes. The essence of the idea is importance sampling 
of the pattern of noises in the channel by Multicanonical Monte Carlo, which enables efficient 
estimation of tails of the distribution of bit errors. The proposed method is successfully tested 
with a convolutional code. 
The dynamic Monte Carlo (Markov chain Monte Carlo,
MCMC) algorithm was discovered in physics in the 1950s and
introduced to statistical data analysis in the 1980s; it is
now recognized as an essential methodology in both fields.
But the use of MCMC need not be restricted to these two 
fields. In fact, it is a general strategy for sampling from 
complicated distributions with unknown normalization 
constants, and there will be a number of potential appli- 
cations in other fields. 

In this note, we discuss the problem of estimating the
distribution of bit errors of error-correcting codes for a
given channel and a given input distribution. The essence
of the proposed idea is importance sampling of the
pattern of noise in the channel by MCMC. With the
proposed strategy, the tails of the error distribution are
calculated efficiently. We also argue that the
Multicanonical Monte Carlo algorithm, a version of
MCMC which utilizes iterative construction of the
sampling weight, is ideally suited to the problem.

Note that efficient generation of unusual events is
important not only for checking theories but also for
practical purposes, because "rare events" may no longer be
rare when the system deviates from idealized models, for
example through unexpected correlations between noises.

Our proposal concerns the use of MCMC for evaluating
the characteristics of codes and channels. It is thus
essentially different from any idea of using MCMC
for decoding messages. An example of the use of MCMC
and related algorithms in this field is found in the
references, which estimate channel capacity by the
Blahut-Arimoto algorithm.

The proposed approach is closely related to the idea of
Hartmann, who studied large deviations of the output
of a sequence alignment algorithm by MCMC sampling.
Similar approaches based on MCMC sampling
of quenched disorder are also used for estimating tails of
the distributions of the ground state energy of physical
systems and for exploring finite-temperature properties
of random magnets. The idea of using MCMC as a
tool for sampling rare events that reduce the worst-case
efficiency of an algorithm seems to have a wide range of
applications.

Let us introduce the problem of estimating the performance
of error-correcting codes. Fix a code, a channel,
and a decoding algorithm, and denote an input message
and its distribution by $x = \{x_i\}$ and $P(x)$. The
encoded message is denoted by $z = \{z_i\}$, and the output
of the noisy channel and its distribution by $y = \{y_i\}$
and $P(y|x)$, respectively. The message decoded from $y$ by the
given method is written $T(y)$. Any kind of
decoding algorithm is allowed, such as Viterbi decoding,
belief propagation, or loopy belief propagation, as long
as $T(y)$ is stable and can be regarded as a deterministic
decoder. Given a distance $d(x, T(y))$ between the
input $x$ and the output $T(y)$, we consider the problem of
calculating the probability distribution

$$P(d) = \sum_{x,y} \delta\bigl(d - d(x, T(y))\bigr)\, P(x)\, P(y|x) \qquad (1)$$

of the distance $d$ between inputs and outputs. In the most
familiar case, $d$ is the Hamming distance and $P(d)$ is the
distribution of bit errors for a given distribution $P(x)$
of inputs.

A naive method for estimating $P(d)$ is to repeat the
following steps (1-3) independently until the desired
accuracy is attained: (1) A message $x$ is generated from
the distribution $P(x)$. (2) The output $y$ of the channel
is sampled from $P(y|x)$. (3) The distance $d(x, T(y))$ is
calculated and recorded. This method is straightforward,
but becomes computationally very expensive when we
are interested in the tails of the distribution $P(d)$,
which correspond to rare events or large deviations under
the assumed distribution $P(x)$ of inputs.
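
The three steps of the naive method can be sketched as follows. This is only an illustration under stated assumptions: the 3-fold repetition code with majority-vote decoding is a hypothetical stand-in for the code and decoder $T(y)$, and the function names (`encode`, `bsc`, `decode`, `naive_estimate`) are not taken from the text.

```python
import random
from collections import Counter

def encode(x, r=3):
    """Hypothetical stand-in code: repeat every message bit r times."""
    return [b for b in x for _ in range(r)]

def bsc(z, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in z]

def decode(y, r=3):
    """Majority-vote decoder, playing the role of the deterministic map T(y)."""
    return [int(sum(y[i:i + r]) * 2 > r) for i in range(0, len(y), r)]

def hamming(a, b):
    """Hamming distance d between two bit sequences of equal length."""
    return sum(u != v for u, v in zip(a, b))

def naive_estimate(n_bits, p, n_samples, seed=0):
    """Histogram estimate of P(d) from independent samples of (x, y)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]  # step (1): uniform P(x) assumed
        y = bsc(encode(x), p, rng)                      # step (2): sample y from P(y|x)
        counts[hamming(x, decode(y))] += 1              # step (3): record d(x, T(y))
    return {d: c / n_samples for d, c in sorted(counts.items())}

est = naive_estimate(n_bits=20, p=0.1, n_samples=20000)
for d, prob in est.items():
    print(d, prob)
```

With a fixed number of samples, values of $d$ whose probability is far below the inverse of the sample size simply never appear in the histogram, which is the limitation discussed above.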

The proposed method, which greatly improves the
efficiency of estimating the tails of $P(d)$, is
multicanonical sampling with a weight $w_M(x, y)$
approximately proportional to $P(d(x, T(y)))^{-1} P(x) P(y|x)$.
Here $P(d(x, T(y)))$ denotes $P(d)$ evaluated at
$d = d(x, T(y))$. From the definition
(1) of $P(d)$, it follows that the marginal distribution
$P^*(d)$ of bit errors $d$ under this weight is nearly
flat on the interval on which $P(d) \neq 0$, i.e.,

$$P^*(d) \simeq c \sum_{x,y} \delta\bigl(d - d(x, T(y))\bigr)\, P(d(x, T(y)))^{-1} P(x) P(y|x)
= c\, P(d)^{-1} \sum_{x,y} \delta\bigl(d - d(x, T(y))\bigr)\, P(x) P(y|x) = c,$$

where $c$ is a constant. This enables both efficient
sampling of the tails of the distribution $P(d)$ and fast mixing
of the Markov chain used for sampling. In this generic
form, the method involves sampling both $x$ and $y$
and looks somewhat complicated. In the examples
discussed below, however, it becomes simpler and reduces to the
original idea of sampling the pattern of noise in the
channel.

How can we realize such sampling and relate its results
to the original problem? If we define
the function $P_M(d)$ by

$$w_M(x, y) = P_M(d(x, T(y)))^{-1} P(x) P(y|x), \qquad (2)$$

the choice of the weight $w_M(x, y)$ reduces to finding
a $P_M(d)$ that yields an almost flat marginal distribution
$P^*(d)$. $P_M(d)$ can then be estimated by repeated
preliminary runs of the simulation, in the same way
as the weight is estimated in a conventional
multicanonical algorithm. While any method from the literature on
multicanonical or Wang-Landau algorithms can be used
for estimating the weight, in the following
example we use a naive method based on histogram construction
(entropic sampling). Once we obtain a $P_M(d)$ that gives
a sufficiently flat distribution $P^*(d)$, the
target distribution $P(d)$ is reconstructed as $P^*(d) P_M(d)$ with
a suitable normalization constant.
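
The iteration and the final reconstruction can be sketched on a toy problem. The following is a minimal sketch, not the simulation used here: it assumes a hypothetical 3-fold repetition code with majority-vote decoding on a BSC, fixes the input to all zeros so that only the noise pattern is sampled, and uses single-bit-flip Metropolis moves. The update $P_M(d) \leftarrow P_M(d) H(d)$ on visited bins is the simple histogram rule; all function and variable names are illustrative.

```python
import math
import random

def entropic_sampling(n_blocks=6, p=0.1, n_runs=8, n_steps=100_000, seed=1):
    """Estimate P(d) for a toy 3-fold repetition code on a BSC.

    The input is fixed to all zeros, so the channel output equals the noise
    pattern e, and d is the number of blocks whose majority vote is wrong.
    """
    rng = random.Random(seed)
    n = 3 * n_blocks
    log_flip = math.log(p / (1.0 - p))   # change of log P(e) when a 0 becomes 1
    log_pm = [0.0] * (n_blocks + 1)      # log P_M(d); flat start = plain sampling
    hist = [0] * (n_blocks + 1)
    for _ in range(n_runs):
        e = [0] * n                      # noise pattern, 1 = flipped bit
        ones = [0] * n_blocks            # number of flipped bits in each block
        d = 0
        hist = [0] * (n_blocks + 1)
        for _ in range(n_steps):
            j = rng.randrange(n)
            b = j // 3
            sign = 1 - 2 * e[j]          # +1 if bit j would become flipped
            new_ones = ones[b] + sign
            d_new = d + (new_ones >= 2) - (ones[b] >= 2)
            # log of w_M(e') / w_M(e), with w_M(e) = P(e) / P_M(d(e))
            log_ratio = sign * log_flip + log_pm[d] - log_pm[d_new]
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                e[j] ^= 1
                ones[b] = new_ones
                d = d_new
            hist[d] += 1
        # Histogram update: P_M(d) <- P_M(d) * H(d) on visited bins only.
        for k, h in enumerate(hist):
            if h > 0:
                log_pm[k] += math.log(h)
    # After the last update, log_pm equals log(H(d) P_M(d)) of the final run,
    # which is the reconstruction P*(d) P_M(d) up to normalization.
    visited = [k for k, h in enumerate(hist) if h > 0]
    top = max(log_pm[k] for k in visited)
    unnorm = {k: math.exp(log_pm[k] - top) for k in visited}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

est = entropic_sampling()
q = 3 * 0.1**2 * 0.9 + 0.1**3            # block error probability of the toy code
exact = {k: math.comb(6, k) * q**k * (1 - q)**(6 - k) for k in range(7)}
for k in sorted(est):
    print(k, f"{est[k]:.3e}", f"{exact[k]:.3e}")
```

For this toy code the exact $P(d)$ is binomial, so the estimate can be checked directly. Bins that are never visited keep their previous weight, which shows up as a horizontal part of the estimated curve.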

Here we test the proposed method with a convolutional
code, whose codewords are $z_i^{(1)} = x_i x_{i+2}$ and
$z_i^{(2)} = x_i x_{i+1} x_{i+2}$. A binary symmetric channel (BSC)
is assumed and a Viterbi decoder is used as $T(y)$. In
this example, gauge invariance considerably simplifies
the algorithm. In particular, we can fix the input $x$ to
an arbitrary bit sequence such as $x_0 = 1111111\ldots$, and
the sampling of $x$ and $y$ reduces to sampling the
pattern $y$ of noise in the channel. Expression
(2) then becomes

$$w_M(y) = P_M(d(x_0, T(y)))^{-1} P(y|x_0),$$

and the proposed algorithm can be applied with some
obvious modifications. In the following example, the lengths
of the original message and the encoded message are 200 and
$(200 - 2) \times 2 = 396$, respectively, and the probability of
a bit flip is set to 0.1. Twenty iterations of preliminary runs are
performed to tune the weight, and the final run
is used to calculate the results.
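
The encoder and the reduction to noise sampling can be sketched as follows. The sketch assumes that bits are written as $\pm 1$, so that the products in the codeword definition are ordinary multiplication, and that the two output streams are interleaved; both conventions, as well as the function names, are assumptions made for illustration. A Viterbi decoder is not included.

```python
import random

def encode(x):
    """Convolutional code in +/-1 notation (assumed):
    z1_i = x_i * x_{i+2},  z2_i = x_i * x_{i+1} * x_{i+2}."""
    z = []
    for i in range(len(x) - 2):
        z.append(x[i] * x[i + 2])
        z.append(x[i] * x[i + 1] * x[i + 2])
    return z

def bsc_noise(n, p, rng):
    """Noise pattern of a BSC: -1 marks a flipped bit, +1 an untouched one."""
    return [-1 if rng.random() < p else 1 for _ in range(n)]

rng = random.Random(0)

# Fixed input x0 = 111...: the codeword is all +1, so the channel output y
# coincides with the noise pattern itself.
x0 = [1] * 200
z0 = encode(x0)
noise = bsc_noise(len(z0), 0.1, rng)
y = [zi * ni for zi, ni in zip(z0, noise)]

# The encoder is multiplicative: encode(x * s) == encode(x) * encode(s)
# elementwise. This is the property that allows the input to be fixed.
x = [rng.choice([-1, 1]) for _ in range(200)]
s = [rng.choice([-1, 1]) for _ in range(200)]
lhs = encode([a * b for a, b in zip(x, s)])
rhs = [a * b for a, b in zip(encode(x), encode(s))]

print(len(z0), y == noise, lhs == rhs)
```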

Fig. 1 shows an example of the convergence of the algorithm,
where the estimated bit error probabilities after the 4th,
12th, and 20th iterations are shown by solid circles (•),
triangles (△), and open circles (○), respectively. The
horizontal part of each curve indicates that no sample was
obtained in that region.

In Fig. 2, the probabilities estimated in the 20th iteration
are compared with those obtained by the naive method based on
uniform random sampling. The symbol (•) corresponds
to the result of the proposed method, where the
preliminary and measurement runs together require about
8 X 10^ Viterbi decodings. The symbol (+) corresponds
to the result of the naive method with 8.08 x 10* Viterbi
decodings. The results of the naive method are not shown
in the region where the method does not give a sample.

The proposed method gives the result in Fig. 2 within a
day of computation on a current personal computer and
enables sampling from the right tail of the probability
distribution $P(d)$, which the naive method can hardly
reach. On the other hand, the result of the proposed
method (•) agrees with that of the naive method
(+) based on uniform random sampling in the range of
higher probabilities, which supports the validity of the
proposed method.
Fig. 1. An example of convergence of the proposed algorithm.

Fig. 2. Comparison of the estimates of $P(d)$ calculated by the
proposed method (•) and the naive method (+). Binary
symmetric channel.

In the case of the binary symmetric channel, the
probability of a bit flip is given, and there is a small but finite
probability of flipping an arbitrary number of bits up to the
length of the encoded message. It is then natural to
expect that the right tail of the distribution $P(d)$
corresponds to a larger number of flipped bits in the channel.
This tendency is illustrated in Fig. 3, where the average
number of flipped bits conditioned on a given
value of bit errors $d$ is plotted as a function of $d$.
Fig. 3. Average number of flipped bits in the channel as a
function of the number $d$ of bit errors.

From this viewpoint, it is more interesting to treat
a channel with a fixed number of flipped bits instead of the
binary symmetric channel. This is equivalent to sampling
the positions of flipped bits under the condition that
the total number of flipped bits is given. An advantage
of the proposed method is that it is easily adapted to
this kind of modification. In the present case, sampling
from the channel with a fixed number of flipped bits is
realized simply by introducing a Metropolis move that
swaps the positions of a flipped bit and a conserved
bit.
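
A swap move of this kind can be sketched as below. Since all noise patterns with the same number of flipped bits are equally probable, the acceptance ratio in this sketch involves only $P_M(d)$. The function `toy_distance` is a hypothetical stand-in for $d(x_0, T(y))$ (no Viterbi decoder is implemented), and the other names are likewise illustrative.

```python
import math
import random

def swap_move(noise, rng):
    """Propose exchanging one flipped bit (1) with one conserved bit (0)."""
    flipped = [i for i, b in enumerate(noise) if b == 1]
    kept = [i for i, b in enumerate(noise) if b == 0]
    i, j = rng.choice(flipped), rng.choice(kept)
    proposal = list(noise)
    proposal[i], proposal[j] = 0, 1
    return proposal

def metropolis_step(noise, d, distance, log_pm, rng):
    """One Metropolis update with weight 1 / P_M(d) at fixed number of flips.

    The proposal is symmetric because the numbers of flipped and conserved
    bits never change, so only the ratio P_M(d) / P_M(d_new) enters.
    """
    proposal = swap_move(noise, rng)
    d_new = distance(proposal)
    log_ratio = log_pm[d] - log_pm[d_new]
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, d_new
    return noise, d

def toy_distance(noise):
    """Hypothetical stand-in for d(x0, T(y)): blocks of 3 bits with >= 2 flips."""
    return sum(sum(noise[i:i + 3]) >= 2 for i in range(0, len(noise), 3))

rng = random.Random(0)
n, k = 396, 40                      # encoded length and number of flipped bits
noise = [1] * k + [0] * (n - k)
rng.shuffle(noise)
d = toy_distance(noise)
log_pm = [0.0] * (n // 3 + 1)       # flat weight: uniform over patterns with k flips
for _ in range(1000):
    noise, d = metropolis_step(noise, d, toy_distance, log_pm, rng)
print(sum(noise), d)
```

In a full multicanonical run, `log_pm` would be tuned over preliminary iterations in the same way as for the binary symmetric channel.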

Fig. 4 shows a result of the proposed method in the case
of a fixed number of flipped bits. The lengths of
the original message and the encoded message are the same
as in Figs. 1-3, and the number of flipped bits
is set to 40. The symbol (•) corresponds to the
probabilities estimated by the proposed method.
Preliminary runs with 9 iterations are performed to tune
the weight, and the 10th run is used to calculate
the results. In total, these runs require about
7.3 X 10* Viterbi decodings. The symbol (+) corresponds to
the result of the naive method with 1.1 x 10^ Viterbi
decodings. The largest value of the bit errors obtained
by the proposed method, 79, seems to be the exact upper
bound under the conditions of Fig. 4, because it is stable
against an increase of the weights for d > 79. In the tail
region shown in Fig. 4, the probability is ~ 10"^*^, which
can never be estimated by the naive method.

In summary, we proposed an application of the idea of 
rare event sampling to the estimation of the performance 
of error-correcting codes. It is shown that a method based 
on multicanonical sampling of the pattern of noises gives 
an efficient way for sampling of the tails of the distribu- 
tion of bit errors with given distributions of the input 
and noise. 

A potential advantage of the proposed approach is that
we can explicitly sample bit patterns of noise that cause
severe "damage" to encoded messages and large
bit errors. This can be useful for understanding the weak
points of a given code. This idea of "weak point
sampling" will be useful for a wide range of problems. Research
in this direction, as well as applications to realistic codes
and channels, is left for future studies.

Fig. 4. Comparison of the estimates of $P(d)$ calculated by the
proposed method (•) and the naive method (+). Fixed number of
flipped bits.